Draft genome assemblies of the ponerine ant Odontoponera transversa and the carpenter ant Camponotus friedae (Hymenoptera: Formicidae)

Objectives Ants are ecologically dominant insects in most terrestrial ecosystems, with more than 14,000 extant species in about 340 genera recorded to date. However, genomic resources are still scarce for most species, especially for species endemic to East or Southeast Asia, which limits the study of phylogeny, speciation and adaptation in this evolutionarily successful animal lineage. Here, we assemble and annotate the genomes of Odontoponera transversa and Camponotus friedae, two ant species with a natural distribution in China, to facilitate future study of ant evolution.
Data description We obtained a total of 16 Gb and 51 Gb of PacBio HiFi data for O. transversa and C. friedae, respectively, which were assembled into draft genomes of 339 Mb for O. transversa and 233 Mb for C. friedae. Genome assessments by multiple metrics showed good completeness and high accuracy for the two assemblies. Gene annotations assisted by RNA-seq data yielded a comparable number of protein-coding genes in the two genomes (10,892 for O. transversa and 11,296 for C. friedae), while repeat annotations revealed a remarkable difference in repeat content between these two ant species (149.4 Mb for O. transversa versus 49.7 Mb for C. friedae). In addition, complete mitochondrial genomes for the two species were assembled and annotated.


Objective
Ants are group-living insects that exhibit complex social behaviors, obligate reproductive division of labor via queen-worker caste differentiation, and extended queen longevity [1]. The common ancestor of all modern ants is estimated to have appeared in the late Jurassic, and the stem-group ants underwent species radiation during the Early Cretaceous, giving rise to the species-rich Formicidae with > 14,000 extant species in about 340 genera [2,3]. Moreover, these tiny creatures have adapted to most terrestrial environments, where they are usually the most abundant insects in local ecosystems [4]. Ants therefore represent a unique system for studying the genetic underpinnings of species radiation and adaptation as well as the evolution of developmental plasticity and longevity. This is particularly true in the genomic era, given that ant genomes are generally small (200-600 Mb) [5]. Indeed, a large-scale genome sequencing initiative has been proposed for ant genomics, i.e., the Global Ant Genomics Alliance (GAGA) [4]. Thanks to the efforts of scientists around the world, the genomes of > 120 ant species have been sequenced and assembled to date. However, the species selected for genome sequencing were mainly collected in Europe and America, and species from some other regions, such as East and Southeast Asia, are under-represented. We expect that this geographically uneven collection of ant genomes will limit in-depth investigation of ant phylogeny and genomic evolution.
In this study, we aim to mitigate this unevenness by conducting whole-genome sequencing for two ant species with a natural distribution in China, the ponerine ant Odontoponera transversa and the carpenter ant Camponotus friedae. O. transversa is commonly observed in Southeast Asia and South China [6,7]. It is a predator that feeds on small insects, especially termites [8]. It has been reported that O. transversa can use the termite trail pheromone to track termites [9]. The predation behavior of O. transversa is believed to play an important role in preventing pest outbreaks and maintaining ecological balance [10]. It is also noteworthy that O. transversa is one of only two extant species in the genus Odontoponera [7]. In contrast, C. friedae belongs to a species-rich genus that encompasses over 1,500 species [11]. The geographic range of C. friedae is mainly restricted to the eastern part of mainland China, Taiwan and Japan [6]. Colonies of C. friedae are typically monogynous, with a single queen, and the workers are polymorphic, with major and minor workers [12]. The two genome assemblies obtained in this study represent the first reference genome for the genus Odontoponera and the fifth for the species-rich genus Camponotus. We therefore anticipate that these genomic resources will be valuable for understanding the biology of Odontoponera and Camponotus, and will facilitate future study of ant phylogeny and evolution.

Data description
The samples of O. transversa were collected from woodland near Yangmei Middle School (21°32′57.26″N, 110°36′39.71″E), Huazhou City, Guangdong Province, in March 2020. The C. friedae samples were collected from the mountain adjacent to Shantang village (26°13′17.04″N, 119°34′12.44″E), Fuzhou City, Fujian Province, also in March 2020. After collection, the ants were immediately transferred to the lab at Kunming Institute of Zoology, Yunnan, China, where they were frozen in liquid nitrogen and subsequently stored at -80 °C. The samples were then packaged with dry ice and sent to Novogene Corporation (Tianjin, China) for DNA extraction and genome sequencing.
For each species, genomic DNA (gDNA) was extracted from a pool of multiple individuals with a sodium dodecyl sulfate (SDS) based method implemented by Novogene. Specifically, eight O. transversa workers and 10 C. friedae gynes were pooled separately by species before DNA extraction. The gDNA was then fragmented to a target size of approximately 15 kb and used to construct PacBio HiFi SMRTbell libraries with the SMRTbell Express Template Prep Kit 2.0 according to the manufacturer's instructions (Pacific Biosciences, CA, USA). The HiFi reads were produced using the circular consensus sequencing (CCS) mode on the PacBio Sequel II System (Table 1, Data set 1). We obtained a total of 16 Gb and 51 Gb of HiFi reads for O. transversa and C. friedae, respectively (Data file 1).
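As a rough illustration of what these yields imply, the mean sequencing depth can be approximated as total read bases divided by genome size. The minimal Python sketch below assumes that the draft assembly sizes reported for these genomes (339 Mb and 233 Mb) are an acceptable proxy for genome size, and that 1 Gb = 10^9 bases and 1 Mb = 10^6 bases; the function name `approx_depth` is hypothetical and is not part of any tool used in this study.

```python
def approx_depth(total_bases_gb: float, genome_size_mb: float) -> float:
    """Approximate mean depth = total read bases / genome size.

    Assumes 1 Gb = 1e9 bases and 1 Mb = 1e6 bases.
    """
    return (total_bases_gb * 1000.0) / genome_size_mb


# HiFi yields reported in the text (Gb).
hifi_yield_gb = {"O. transversa": 16.0, "C. friedae": 51.0}
# Draft assembly sizes (Mb), used here only as a proxy for genome size.
assembly_size_mb = {"O. transversa": 339.0, "C. friedae": 233.0}

for species, yield_gb in hifi_yield_gb.items():
    depth = approx_depth(yield_gb, assembly_size_mb[species])
    print(f"{species}: ~{depth:.0f}x")
```

Under these assumptions the estimate is roughly 47x for O. transversa and 219x for C. friedae; the true depth depends on the actual genome size, which may differ from the assembly size.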
In addition, we generated RNA-seq data from the samples remaining after genome sequencing to assist gene annotation. Total RNA was extracted from the whole bodies of two O. transversa gynes, 15 C. friedae small workers and 10 C. friedae middle workers by the TRIzol method. The three RNA samples were then subjected to RNA-seq library construction and paired-end sequencing with the DNBSEQ-T1 system at China National GeneBank (Shenzhen, China). We obtained 42.2 Gb and 43.2 Gb of RNA-seq data for the small workers and middle workers of C. friedae, respectively, and 38.0 Gb for the gynes of O. transversa (Table 1, Data file 2).
The PacBio HiFi reads were assembled by Wtdbg2 (v2.5) [13] with the mode optimal for HiFi data (parameters: -x ccs), which yielded draft genome assemblies of 339 Mb for O. transversa and 233 Mb for C. friedae (Table 1, Data set 3). The O. transversa assembly comprised 6,442 contigs with an N50 length of 101.7 kb and a GC content of 41%, while the C. friedae assembly contained 3,302 contigs with an N50 length of 159.7 kb and a GC content of 35% (Data file 3). Minimap2 (v2.1) [14] was used to align the HiFi reads to the assembled genomes, which gave an alignment rate of 99.5% for O. transversa and 94.3% for C. friedae. BUSCO (v5.3.2) [15] assessment based on the hymenopteran gene database revealed that both genome assemblies had a completeness score over 90% (Data file 4). The consensus quality value (QV) assessed by Merqury (v1.3) [16] with the HiFi data was 38.3 for O. transversa and 46.8 for C. friedae. In addition, more than 99% of the genomic positions in the two assemblies were covered by at least three HiFi reads and ~98% were covered by at least five HiFi reads (Data file 5). Taken together, these metrics support the good completeness and high accuracy of the O. transversa and C. friedae genome assemblies.
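For readers less familiar with these metrics, the Python sketch below shows how N50, GC content and the error rate implied by a Phred-scaled QV are conventionally computed. It is a minimal illustration on toy data, not the code used to produce the values above; the function names are hypothetical, and the QV conversion assumes the standard Phred definition QV = -10 log10(error rate).

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of all bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def gc_content(seqs: List[str]) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T), case-insensitive."""
    gc = 0
    acgt = 0
    for seq in seqs:
        s = seq.upper()
        gc_here = s.count("G") + s.count("C")
        gc += gc_here
        acgt += gc_here + s.count("A") + s.count("T")
    return gc / acgt if acgt else 0.0


def qv_to_error_rate(qv: float) -> float:
    """Per-base error rate implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


# Toy contigs (hypothetical data, for illustration only).
toy_contigs = ["GGCCGGCCAT", "ATATGCGC", "ATATA", "GCGC", "AAT"]
print("N50:", n50(len(c) for c in toy_contigs))
print("GC fraction:", round(gc_content(toy_contigs), 3))

# QVs reported in the text.
for species, qv in {"O. transversa": 38.3, "C. friedae": 46.8}.items():
    print(f"{species}: QV {qv} -> error rate {qv_to_error_rate(qv):.2e}")
```

Under the Phred assumption, the reported QVs of 38.3 and 46.8 correspond to roughly 1.5 and 0.2 errors per 10 kb of consensus sequence, respectively.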
Protein-coding genes were predicted using GeMoMa (v1.9), which utilizes homology and RNA-seq evidence to accurately predict gene models [17]. Specifically, RNA-seq reads were first aligned to the genome using Hisat2 (v2.2.1) [18], followed by reference-based transcriptome assembly using StringTie2 (v2.1.4) [19] and open reading frame prediction using TransDecoder (v5.7.1) [20]. The transcriptome-derived gene models of O. transversa and C. friedae were then combined with the gene models of C. floridanus, Atta cephalotes, Ooceraea biroi, Nasonia vitripennis and Tribolium castaneum to serve as homology evidence for GeMoMa. Meanwhile, RNA-seq derived splice junctions from the Hisat2 alignments were used by GeMoMa to refine the exon-intron boundaries (Table 1, Data set 4, 5). In total, 10,892 and 11,296 protein-coding genes were identified in the genomes of O. transversa and C. friedae, respectively. BUSCO assessment with the hymenopteran gene database reported a completeness score of around 90% for both gene sets (Data file 6). In addition, homology searches against the InterPro, UniProtKB, NCBI nr, and KEGG databases assigned putative functional annotations to more than 95% of the protein-coding genes (Data set 6, 7).
The mitochondrial genomes of the two ants were assembled by MitoZ (v2.2) [24]. Mitochondrial gene annotation was carried out using the annotate function of MitoZ (--clade Arthropoda) and the online server of MITOS2 (--code Invertebrate (5), --refseqver RefSeq63 Metazoa) [25], followed by manual inspection of each gene locus. The total lengths of the O. transversa and C. friedae mitochondrial genome assemblies were 16.1 kb and 18.8 kb, respectively, with all the expected mitochondrial genes (13 protein-coding genes, 22 tRNA genes, and 2 rRNA genes) identified (Table 1, Data set 10, 11).

Limitations
The genome assemblies of the two ant species are still fragmented, which precludes the study of chromosome-level rearrangements or structural variation. The incorporation of Hi-C sequencing data to achieve chromosome-level assemblies is expected to overcome this limitation in the future. In addition, the collection of transcriptome data from more castes and developmental stages is required to further improve the gene annotations. Despite these limitations, we anticipate that the draft genome assemblies, together with the genome annotations generated in this study, will be valuable for phylogenomics and comparative genomics of ants and other hymenopteran insects.